Abstract: We consider the problem of estimating the partition
function of the ferromagnetic $q$-state Potts model. We propose an
importance sampling algorithm in the dual of the normal factor
graph representing the model. The algorithm can efficiently
compute an estimate of the partition function when the coupling
parameters of the model are strong (corresponding to models at
low temperature) or when the model contains a mixture of strong
and weak couplings. We show that, in this setting, the proposed
algorithm significantly outperforms the state-of-the-art methods.

I. Introduction 

The Potts model [1] plays an important role in many areas,
including statistical physics, image processing, and
graph theory. The fundamental quantity of interest in the
model is the partition function. For example, in Bayesian
model selection, the partition function represents the marginal
likelihood (the evidence) [5, Chapter 5]; in statistical physics,
the internal energy follows from differentiating the log
partition function [6, Chapter 2]; and in deep belief networks,
estimates of this quantity are needed in maximum likelihood
learning. In general, the exact computation of the partition
function is intractable, as it requires a sum with an exponential
number of terms. Although the planar Ising model (Potts with
binary variables) without an external field is not difficult,
the general 2D $q$-state Potts model without an external field
is already computationally hard$^1$ [9].

In complex models, quantities of interest can be estimated
via Monte Carlo (MC) methods. In the low-temperature regime,
however, MC methods based on single spin-flips suffer from
critical slowing down and have erratic convergence. More
advanced MC methods (e.g., nested sampling and the
Swendsen-Wang algorithm) require sampling from a large
sequence of constraints or intermediate distributions at
different temperatures to estimate the partition function.

We represent the Potts model using graphical models defined
in terms of Forney/normal factor graphs (NFGs). The partition
function of an NFG is related to the partition function of the
dual NFG via the normal factor graph duality theorem. In the
models that we study here, this theorem states that the two
partition functions are equal up to scale. It was shown
previously that for the nearest-neighbor 2D Ising model, at low
temperature, MC methods converge faster in the dual NFG than in
the primal NFG. MC methods based on uniform and Gibbs sampling
in the dual domain were proposed in [19], [20] to estimate the
partition function.

$^1$There are, however, a few exactly solved cases, including the
one-dimensional (1D) Potts model [6, Chapter 5], and in
thermodynamic limits, for antiferromagnetic Potts models at zero
temperature, the two-dimensional (2D) 3-state and the triangular
4-state models.

In this paper, we extend the previous results to propose
an importance sampling algorithm to estimate the partition
function of the nearest-neighbor 2D ferromagnetic $q$-state Potts
model. The proposed algorithm also operates in the dual NFG
of the model. Our analytical results prove that we can obtain
very accurate estimates of the partition function when the
coupling parameters on only a subset of the edges (which
forms a spanning tree) are strong. Our numerical results show
that, in various settings, our algorithm improves upon the
state-of-the-art MC methods in the dual domain and deterministic
methods in the primal domain by more than an order of
magnitude.

The paper is organized as follows. In Section II, we review
the Potts model and its graphical model representation in terms
of NFGs. Dual NFGs and the normal factor graph duality
theorem are discussed in Section III. The dual NFG of the
Potts model is used in Section IV to describe the proposed MC
methods. Numerical experiments are reported in Section V.


II. The 2D Potts model 

Let $X_1, X_2, \ldots, X_N$ be a collection of discrete random
variables arranged on the sites of a 2D grid, as illustrated in
Fig. 1, where interactions are restricted to adjacent
(nearest-neighbor) variables. Suppose that each random variable
takes on values in a finite alphabet $\mathcal{X}$, which in this
context is equal to the abelian group
$\mathbb{Z}/q\mathbb{Z} = \{0, 1, \ldots, q-1\}$. Here, $q$ is an
integer satisfying $q \ge 2$.

Let $x_i$ represent a possible realization of $X_i$, let
$\mathbf{x}$ stand for a configuration $(x_1, x_2, \ldots, x_N)$,
and let $\mathbf{X}$ stand for $(X_1, X_2, \ldots, X_N)$. A real
coupling parameter $J_{k,\ell}$ is associated with each pair of
adjacent variables $(x_k, x_\ell)$.

We start with the Potts model without an external field (the
Potts model in an external field is discussed in Appendix I).
The energy of a configuration $\mathbf{x}$ is given by the
Hamiltonian

\[ \mathcal{H}(\mathbf{x}) = -\sum_{(k,\ell) \in \mathcal{B}} J_{k,\ell} \cdot [x_k = x_\ell] \tag{1} \]

Here, $\mathcal{B}$ contains all the unordered pairs (bonds)
$(k,\ell)$ with non-zero interactions, and $[\cdot]$ denotes the
Iverson bracket, which evaluates to 1 if the condition in the
bracket is satisfied and to 0 otherwise. The parameter
$J_{k,\ell}$ controls the strength of the interaction between
$(x_k, x_\ell)$. In this paper, we concentrate on ferromagnetic
Potts models, characterized by $J_{k,\ell} > 0$ for each
$(k,\ell) \in \mathcal{B}$.

The probability that the model is in configuration $\mathbf{x}$ is
given by the Boltzmann distribution

\[ p_B(\mathbf{x}) = \frac{e^{-\mathcal{H}(\mathbf{x})/T}}{Z} \tag{2} \]

Here, the normalization constant $Z$ is the partition function
$Z = \sum_{\mathbf{x} \in \mathcal{X}^N} e^{-\mathcal{H}(\mathbf{x})/T}$
and $T$ denotes the temperature. In the rest of this paper, we
assume $T = 1$. Hence, large values (resp. small values) of $J$
correspond to models at low (resp. high) temperature.

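For each adjacent pair $(x_k, x_\ell)$, let
$\kappa_{k,\ell} : \mathcal{X}^2 \to \mathbb{R}_{\ge 0}$ be

\[ \kappa_{k,\ell}(x_k, x_\ell) = e^{J_{k,\ell} \cdot [x_k = x_\ell]} \tag{3} \]

We can then define $f : \mathcal{X}^N \to \mathbb{R}_{\ge 0}$ as

\[ f(\mathbf{x}) = \prod_{(k,\ell) \in \mathcal{B}} \kappa_{k,\ell}(x_k, x_\ell) \tag{4} \]

From (4), $Z$ in (2) can be expressed as

\[ Z = \sum_{\mathbf{x} \in \mathcal{X}^N} f(\mathbf{x}) \tag{5} \]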

The factorization in (4) can be represented by an NFG,
in which nodes and edges represent factors and variables,
respectively. The edge representing variable $x$ is connected
to the node representing factor $\kappa(\cdot)$ if and only if $x$
is an argument of $\kappa(\cdot)$. If a variable (an edge) appears
in more than two factors, such a variable is replicated via an
equality indicator factor. The equality indicator factors are
denoted by $\Phi_=(\cdot)$; they impose the constraint that all
their incident variables be equal.

The NFG for the factorization in (4) is shown in Fig. 1,
where the boxes labeled “=” are equality indicator factors. For
variables $x_2, x_2', x_2''$ in Fig. 1, the equality indicator
factor is

\[ \Phi_=(x_2, x_2', x_2'') = [x_2 = x_2' = x_2''] \tag{6} \]

which evaluates to 1 if all its arguments are equal, and to 0
otherwise.

Throughout this paper, boundary conditions are assumed to
be periodic. Thus, each equality indicator factor has degree
four and $|\mathcal{B}| = 2N$. For simplicity, periodic boundary
conditions are not shown.

At high temperature (i.e., small $J$), the Boltzmann distribution
(2) approaches the uniform distribution. To estimate $Z$ in
this case, MC methods in the primal NFG, as in Fig. 1, perform
well. In more challenging situations (e.g., at low temperature
where (2) is highly non-uniform), we propose an importance
sampling algorithm in the dual NFG of the 2D Potts model to
estimate $Z$.



Fig. 1: The NFG of the 2D Potts model, where unlabeled boxes
represent (3) and boxes containing “=” symbols are as in (6).

Fig. 2: The dual NFG of the 2D Potts model, where the
unlabeled boxes represent factors (8) and boxes containing
“+” symbols are as in (9).

III. The dual NFG 

We briefly discuss the normal factor graph duality theorem,
which relates $Z$ to the partition function of the dual NFG,
denoted by $Z_d$. We will also explain a procedure to obtain the
dual NFG of the 2D Potts model. This procedure preserves the
value of $Z$ up to scale.

We will use the tilde symbol to denote variables in the
dual domain. Note that $\tilde{X}$ also takes on values in
$\mathcal{X}$. Here, $\gamma(\tilde{x}_1, \tilde{x}_2)$, the 2D
discrete Fourier transform (DFT) of $\kappa(x_1, x_2)$, is

\[ \gamma(\tilde{x}_1, \tilde{x}_2) = \frac{1}{|\mathcal{X}|} \sum_{x_1 \in \mathcal{X}} \sum_{x_2 \in \mathcal{X}} \kappa(x_1, x_2)\, e^{-\mathrm{i} 2\pi (x_1 \tilde{x}_1 + x_2 \tilde{x}_2)/|\mathcal{X}|} \tag{7} \]

where $\mathrm{i}$ is the unit imaginary number [24].

To obtain the dual of Fig. 1, we first replace each factor
$\kappa_{k,\ell}(x_k, x_\ell)$ with its 2D DFT
$\gamma_{k,\ell}(\tilde{x}_k, \tilde{x}_\ell)$, surrounded by two
complex-valued factors, one on each incident edge, which
represent the inverse Fourier transform. We then merge each of
these factors with the corresponding equality indicator factor,
i.e., the factor on the edge representing $x_k$ is merged with
the equality indicator factor incident to that edge, and the
factor on the edge representing $x_\ell$ with the equality
indicator factor incident to $x_\ell$.

After applying the above procedure to all the factors in
Fig. 1, we obtain the dual NFG of the 2D Potts model, in
which factors (3) are replaced by their 2D DFT, and equality
indicator factors by their inverse DFT.

The 2D DFT of (3) has the following form

\[ \gamma_{k,\ell}(\tilde{x}_k, \tilde{x}_\ell) = \begin{cases} e^{J_{k,\ell}} + q - 1, & \text{if } \tilde{x}_k = \tilde{x}_\ell = 0 \\ e^{J_{k,\ell}} - 1, & \text{if } \tilde{x}_k + \tilde{x}_\ell = 0 \\ 0, & \text{otherwise.} \end{cases} \tag{8} \]

The inverse DFT of an equality indicator factor (with degree
four) is a mod-$q$ indicator factor (also with degree four),
up to scale factor $q$. The mod-$q$ indicator factors are denoted
by $\Phi_+(\cdot)$. They impose the constraint that all their
incident variables sum to zero (modulo $q$).
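The closed form (8) can be checked numerically. The sketch below assumes that the 2D DFT carries a $1/q$ normalization and the negative-exponent kernel used by `numpy.fft.fft2`; the function names are hypothetical and the check is an illustration, not part of the original derivation.

```python
import numpy as np


def dft2_of_kappa(J, q):
    """Normalized 2D DFT of kappa(x1, x2) = exp(J * [x1 == x2]) over Z/qZ."""
    kappa = np.exp(J * np.eye(q))      # e^J on the diagonal, 1 elsewhere
    # np.fft.fft2 uses the kernel exp(-2*pi*i*(x1*t1 + x2*t2)/q)
    return np.fft.fft2(kappa) / q


def gamma_closed_form(J, q):
    """Right-hand side of (8) as a q x q table indexed by (t1, t2)."""
    g = np.zeros((q, q))
    for t1 in range(q):
        t2 = (-t1) % q                 # only t1 + t2 = 0 (mod q) survives
        g[t1, t2] = np.exp(J) - 1.0
    g[0, 0] = np.exp(J) + q - 1.0      # the case t1 = t2 = 0
    return g
```

For any coupling `J` and alphabet size `q`, `np.allclose(dft2_of_kappa(J, q), gamma_closed_form(J, q))` should hold under the stated normalization.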

The dual NFG of the 2D Potts model is illustrated
in Fig. 2, where the unlabeled boxes represent factors (8) and
boxes labeled “+” are mod-$q$ indicator factors. For variables
$\tilde{x}_2, \tilde{x}_3, \tilde{x}_4$ in Fig. 2, the mod-$q$
indicator factor is

\[ \Phi_+(\tilde{x}_2, \tilde{x}_3, \tilde{x}_4) = [\tilde{x}_2 + \tilde{x}_3 + \tilde{x}_4 = 0] \tag{9} \]

which evaluates to 1 if the sum of its arguments is zero, and
to 0 otherwise.

After multiplying the $N$ local scale factors that appear in
the dualization procedure, we obtain

\[ Z_d = q^N Z \tag{10} \]

For more details, see the normal factor graph duality theorem
(Theorem 2 of the cited reference).
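The relation (10) can be verified by brute force on a tiny instance. The sketch below is an illustration under assumptions: it uses a $2 \times 2$ periodic grid, one-variable dual factors of the form $e^{J} + q - 1$ (at zero) and $e^{J} - 1$ (elsewhere), and a sign-inverter convention in which each bond variable enters its first endpoint with a plus sign and its second endpoint with a minus sign. All function names are hypothetical.

```python
import itertools
import math


def grid_bonds(n):
    """Directed bonds (k, l) of an n x n periodic grid; there are 2*n*n."""
    bonds = []
    for r in range(n):
        for c in range(n):
            k = r * n + c
            bonds.append((k, r * n + (c + 1) % n))
            bonds.append((k, ((r + 1) % n) * n + c))
    return bonds


def primal_Z(n, q, J):
    """Partition function by enumeration of all primal configurations."""
    bonds = grid_bonds(n)
    return sum(
        math.exp(sum(J[b] for b, (k, l) in enumerate(bonds) if x[k] == x[l]))
        for x in itertools.product(range(q), repeat=n * n)
    )


def dual_Z(n, q, J):
    """Sum of dual weights over configurations satisfying all mod-q checks."""
    bonds = grid_bonds(n)
    Zd = 0.0
    for xt in itertools.product(range(q), repeat=len(bonds)):
        node_sum = [0] * (n * n)
        for b, (k, l) in enumerate(bonds):
            node_sum[k] += xt[b]       # plus sign at the first endpoint
            node_sum[l] -= xt[b]       # sign inverter at the second endpoint
        if any(s % q for s in node_sum):
            continue                   # violates a mod-q indicator factor
        weight = 1.0
        for b, t in enumerate(xt):
            weight *= math.exp(J[b]) + q - 1 if t == 0 else math.exp(J[b]) - 1
        Zd += weight
    return Zd
```

For example, with `n = 2`, `q = 3` and arbitrary positive couplings, `dual_Z(n, q, J)` should agree with `q ** (n * n) * primal_Z(n, q, J)` up to floating-point error.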

In the following, we introduce a convenient modification
to Fig. 2; the motivation will become apparent in Section IV.
Note that $\gamma_{k,\ell}(\tilde{x}_k, \tilde{x}_\ell)$ in (8) is
non-zero if and only if $\tilde{x}_k + \tilde{x}_\ell = 0$. In
other words, (8) is supported on pairs with
$\tilde{x}_\ell = -\tilde{x}_k$. Therefore, by inserting a sign
inverter in one of the edges incident to $\gamma_{k,\ell}$, it is
possible to represent this factor using only one variable. For
notational simplicity, we express this factor as
$\gamma_k(\tilde{x}_k)$.

We can thus modify Fig. 2 to construct Fig. 3, with factors
attached to each equality indicator factor

\[ \gamma_k(\tilde{x}_k) = \begin{cases} e^{J_k} + q - 1, & \text{if } \tilde{x}_k = 0 \\ e^{J_k} - 1, & \text{otherwise,} \end{cases} \tag{11} \]

where $J_k$ is the coupling parameter associated with each edge
(bond), or “the bond strength”.

The modified dual NFG is shown in Fig. 3, where the
unlabeled boxes represent (11). In analogy with the logic
NAND gate, the sign inverters are depicted by small circles
attached to one side of the equality indicator factors; however,
the choice can be made arbitrarily. A similar approach has been
used in the context of non-binary codes.



Fig. 3: Modified dual NFG of the 2D Potts model, where the
unlabeled boxes represent factors (11) and the small circles
denoted by “o” are sign inverters.

Fig. 4: A partitioning of the variables in Fig. 3, where the thin
edges represent $\tilde{\mathbf{X}}_A$ and the remaining thick edges
represent $\tilde{\mathbf{X}}_B$. Here, $\tilde{\mathbf{X}}_B$ is a
linear combination of $\tilde{\mathbf{X}}_A$. A valid
configuration in a 3-state Potts model is also illustrated.


In Section IV, we show that Fig. 3 can be used as an efficient
computational tool to compute MC estimates of $Z_d$, which can
then be used to estimate $Z$ via the normal factor graph duality
theorem, cf. (10). In this paper, the focus is on ferromagnetic
models, and as a result, factors (8) and (11) are nonnegative.
We require all factors to be nonnegative as we need to define
probability mass functions in Fig. 3 for our MC methods.

In the sequel, we will refer to Fig. 3 as the dual NFG or
simply as the dual of the 2D Potts model.


IV. Monte Carlo methods 

We describe our MC methods in Fig. 3, where we partition
the set of variables $\tilde{\mathbf{X}}$ into
$\tilde{\mathbf{X}}_A$ and $\tilde{\mathbf{X}}_B$, with the
condition that $\tilde{\mathbf{X}}_B$ is a linear combination
(involving mod-$q$ indicator factors) of $\tilde{\mathbf{X}}_A$.
In this set-up, $\tilde{\mathbf{X}}_B$ forms a spanning tree, and
a valid configuration in Fig. 3 can be created by assigning
values to $\tilde{\mathbf{X}}_A$, followed by computing/updating
$\tilde{\mathbf{X}}_B$.

An example of such a partitioning, with a numerical example
of a valid configuration in a 3-state Potts model, is
illustrated in Fig. 4, where $\tilde{\mathbf{X}}_A$ is the set of
all the variables associated with the thin edges (thin bonds)
and $\tilde{\mathbf{X}}_B$ is the set of all the variables
associated with the remaining thick edges. Accordingly, let
$\mathcal{B}_A \subset \mathcal{B}$ contain the indices of the
variables in $\tilde{\mathbf{X}}_A$, and
$\mathcal{B}_B = \mathcal{B} - \mathcal{B}_A$.

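The importance sampling algorithm works as follows. To
draw $\tilde{\mathbf{x}}^{(\ell)}$ at iteration $\ell$, we first
draw a sample $\tilde{\mathbf{x}}_A^{(\ell)}$ according to an
auxiliary (proposal) distribution in the dual NFG. We then
update $\tilde{\mathbf{x}}_B^{(\ell)}$ to create a valid
configuration
$\tilde{\mathbf{x}}^{(\ell)} = (\tilde{\mathbf{x}}_A^{(\ell)}, \tilde{\mathbf{x}}_B^{(\ell)})$.
Updating $\tilde{\mathbf{x}}_B^{(\ell)}$ is easy as
$\tilde{\mathbf{X}}_B$ is a linear combination of
$\tilde{\mathbf{X}}_A$.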

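Let us define

\[ \Gamma_A(\tilde{\mathbf{x}}_A) = \prod_{k \in \mathcal{B}_A} \gamma_k(\tilde{x}_k) \tag{12} \]

\[ \Gamma_B(\tilde{\mathbf{x}}_B) = \prod_{k \in \mathcal{B}_B} \gamma_k(\tilde{x}_k) \tag{13} \]

In our importance sampling algorithm, we use the following
probability mass function as the auxiliary distribution

\[ q_d(\tilde{\mathbf{x}}_A) = \frac{\Gamma_A(\tilde{\mathbf{x}}_A)}{Z_{q_d}}, \quad \forall \tilde{\mathbf{x}}_A \in \mathcal{X}^{|\mathcal{B}_A|} \tag{14} \]

where $|\mathcal{B}_A|$ denotes the cardinality of $\mathcal{B}_A$.

The partition function $Z_{q_d}$ is analytically available as

\[ Z_{q_d} = \sum_{\tilde{\mathbf{x}}_A} \Gamma_A(\tilde{\mathbf{x}}_A) = \prod_{k \in \mathcal{B}_A} \sum_{t=0}^{q-1} \gamma_k(t) = q^{|\mathcal{B}_A|} \exp\Big( \sum_{k \in \mathcal{B}_A} J_k \Big) \tag{15} \]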
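As a worked check of the last step in (15), assuming the one-variable dual factors take the form (11), the inner sum over a single variable can be expanded explicitly:

```latex
\begin{align*}
\sum_{t=0}^{q-1} \gamma_k(t)
  &= \bigl(e^{J_k} + q - 1\bigr) + (q-1)\bigl(e^{J_k} - 1\bigr)
   = q\, e^{J_k}, \\
Z_{q_d}
  &= \prod_{k \in \mathcal{B}_A} q\, e^{J_k}
   = q^{|\mathcal{B}_A|} \exp\Bigl(\sum_{k \in \mathcal{B}_A} J_k\Bigr).
\end{align*}
```

The same expansion gives the probability of drawing a zero for a single variable, $\gamma_k(0)/(q\,e^{J_k}) = (1 + (q-1)e^{-J_k})/q$.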

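It is also straightforward to draw independent samples
$\tilde{\mathbf{x}}_A^{(1)}, \tilde{\mathbf{x}}_A^{(2)}, \ldots, \tilde{\mathbf{x}}_A^{(\ell)}, \ldots, \tilde{\mathbf{x}}_A^{(L)}$
according to $q_d(\tilde{\mathbf{x}}_A)$. The product form of (12)
indicates that to draw $\tilde{\mathbf{x}}_A^{(\ell)}$ we can apply
Algorithm 1, where in line 3, $(1 + (q-1)e^{-J_k})/q$ is equal to
$\gamma_k(0) / \sum_{t=0}^{q-1} \gamma_k(t)$.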


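Algorithm 1 Drawing $\tilde{\mathbf{x}}_A^{(\ell)}$ according to $q_d(\tilde{\mathbf{x}}_A)$

1: draw $u_1^{(\ell)}, u_2^{(\ell)}, \ldots, u_{|\mathcal{B}_A|}^{(\ell)} \overset{\text{i.i.d.}}{\sim} \mathcal{U}[0, 1]$
2: for $k = 1$ to $|\mathcal{B}_A|$ do
3:   if $u_k^{(\ell)} < \frac{1 + (q-1)e^{-J_k}}{q}$ then
4:     $\tilde{x}_{A,k}^{(\ell)} = 0$
5:   else
6:     draw $\tilde{x}_{A,k}^{(\ell)}$ randomly in $\{1, 2, \ldots, q-1\}$
7:   end if
8: end for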
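Algorithm 1 translates almost line by line into code. The following minimal sketch assumes the couplings of the thin edges are given as a list `J_A` and that the nonzero values are drawn uniformly; the function name `draw_x_A` and the `rng` argument are hypothetical.

```python
import math
import random


def draw_x_A(J_A, q, rng=random):
    """Draw one sample of the thin-edge variables, one independent draw per bond."""
    sample = []
    for J in J_A:
        # Probability of the value 0 under the proposal: gamma(0) / sum_t gamma(t)
        p_zero = (1.0 + (q - 1) * math.exp(-J)) / q
        if rng.random() < p_zero:
            sample.append(0)
        else:
            # Remaining mass is spread uniformly over {1, ..., q-1}
            sample.append(rng.randint(1, q - 1))
    return sample
```

Passing a seeded `random.Random` instance as `rng` makes runs reproducible.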


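After drawing $\tilde{\mathbf{x}}_A^{(\ell)}$, we update
$\tilde{\mathbf{x}}_B^{(\ell)}$ to create a valid configuration
$\tilde{\mathbf{x}}^{(\ell)} = (\tilde{\mathbf{x}}_A^{(\ell)}, \tilde{\mathbf{x}}_B^{(\ell)})$.
Algorithm 2 then uses these samples to estimate $Z_d$. It is easy
to verify that $\hat{Z}_{\text{IS}}$ in (17) is an unbiased
estimator of $Z_d$. Indeed,

\[ \mathrm{E}_{q_d}[\hat{Z}_{\text{IS}}] = Z_d \tag{16} \]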


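Algorithm 2 Estimating $Z_d$

1: for $\ell = 1$ to $L$ do
2:   draw $\tilde{\mathbf{x}}_A^{(\ell)}$ according to $q_d(\tilde{\mathbf{x}}_A)$
3:   update $\tilde{\mathbf{x}}_B^{(\ell)}$
4: end for
5: compute

\[ \hat{Z}_{\text{IS}} = \frac{Z_{q_d}}{L} \sum_{\ell=1}^{L} \Gamma_B(\tilde{\mathbf{x}}_B^{(\ell)}) \tag{17} \]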
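Algorithms 1 and 2 can be combined into a small end-to-end sketch. Several choices below are assumptions made for illustration and are not prescribed by the text: the spanning tree is obtained by breadth-first search from site 0, bonds are oriented so that each variable enters its first endpoint with a plus sign and its second endpoint with a minus sign, and the tree variables are solved leaves first. All names are hypothetical.

```python
import math
import random
from collections import deque


def grid_bonds(n):
    """Directed bonds (k, l) of an n x n periodic grid; there are 2*n*n."""
    bonds = []
    for r in range(n):
        for c in range(n):
            k = r * n + c
            bonds.append((k, r * n + (c + 1) % n))
            bonds.append((k, ((r + 1) % n) * n + c))
    return bonds


def bfs_tree(n_sites, bonds):
    """Return the BFS visiting order and, per site, the bond to its parent."""
    adjacency = [[] for _ in range(n_sites)]
    for b, (k, l) in enumerate(bonds):
        adjacency[k].append((b, l))
        adjacency[l].append((b, k))
    parent_bond, order, queue = {0: None}, [0], deque([0])
    while queue:
        v = queue.popleft()
        for b, w in adjacency[v]:
            if w not in parent_bond:
                parent_bond[w] = b
                order.append(w)
                queue.append(w)
    return order, parent_bond


def gamma(J, q, t):
    """One-variable dual factor: e^J + q - 1 at zero, e^J - 1 elsewhere."""
    return math.exp(J) + q - 1 if t == 0 else math.exp(J) - 1


def estimate_Zd(n, q, J, L, rng):
    """Importance sampling estimate of the dual partition function."""
    bonds = grid_bonds(n)
    order, parent_bond = bfs_tree(n * n, bonds)
    tree = {b for b in parent_bond.values() if b is not None}    # thick edges
    free = [b for b in range(len(bonds)) if b not in tree]       # thin edges
    Z_q = q ** len(free) * math.exp(sum(J[b] for b in free))     # proposal normalizer
    total = 0.0
    for _ in range(L):
        x = [0] * len(bonds)
        resid = [0] * (n * n)          # signed sum of bond variables per node
        for b in free:                 # draw the thin-edge variables
            p_zero = (1 + (q - 1) * math.exp(-J[b])) / q
            x[b] = 0 if rng.random() < p_zero else rng.randint(1, q - 1)
            k, l = bonds[b]
            resid[k] += x[b]
            resid[l] -= x[b]
        for v in reversed(order[1:]):  # leaves first, root excluded
            b = parent_bond[v]
            k, l = bonds[b]
            # Pick the tree variable that cancels the signed sum at node v
            x[b] = (-resid[v]) % q if v == k else resid[v] % q
            resid[k] += x[b]
            resid[l] -= x[b]
        total += math.prod(gamma(J[b], q, x[b]) for b in tree)
    return Z_q * total / L
```

A call such as `estimate_Zd(2, 3, [3.0] * 8, 4000, random.Random(1))` returns an estimate of $Z_d$; dividing by $q^N$ then gives an estimate of $Z$ through the duality relation.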


The accuracy of $\hat{Z}_{\text{IS}}$ depends on the fluctuations of
$\Gamma_B(\tilde{\mathbf{x}}_B)$. If $\Gamma_B(\tilde{\mathbf{x}}_B)$
varies smoothly, $\hat{Z}_{\text{IS}}$ will have a small
variance. From (11) and (13), we expect to observe a small
variance if $J_k$ is large for $k \in \mathcal{B}_B$. See our
analysis of the variance in Section IV-A.

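An algorithm based on uniform sampling, as applied in [19],
[20], can be obtained by drawing $\tilde{\mathbf{x}}_A^{(\ell)}$
uniformly and independently from $\mathcal{X}^{|\mathcal{B}_A|}$,
i.e., according to

\[ u_d(\tilde{\mathbf{x}}_A) = \frac{1}{|\mathcal{X}|^{|\mathcal{B}_A|}}, \quad \forall \tilde{\mathbf{x}}_A \in \mathcal{X}^{|\mathcal{B}_A|} \tag{18} \]

where $|\mathcal{X}| = q$. After updating
$\tilde{\mathbf{x}}_B^{(\ell)}$, we use the created configurations
in

\[ \hat{Z}_{\text{Unif}} = \frac{q^{|\mathcal{B}_A|}}{L} \sum_{\ell=1}^{L} \Gamma_A(\tilde{\mathbf{x}}_A^{(\ell)})\, \Gamma_B(\tilde{\mathbf{x}}_B^{(\ell)}) \tag{19} \]

It is easy to verify that $\mathrm{E}_{u_d}[\hat{Z}_{\text{Unif}}] = Z_d$.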

The performance of the uniform and importance sampling
algorithms will be close in models at very low temperature
(i.e., for large $J_k$). However, for a wider range of
parameters, importance sampling outperforms uniform sampling
(see Section V).


A. Analysis of the variance 


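We present a heuristic argument for the accuracy of our MC
methods. Let us write (11) as

\[ \gamma_k(\tilde{x}_k) = (e^{J_k} + q - 1) \left( \frac{e^{J_k} - 1}{e^{J_k} + q - 1} \right)^{1 - [\tilde{x}_k = 0]} \tag{20} \]

\[ \propto \left( \frac{e^{J_k} - 1}{e^{J_k} + q - 1} \right)^{1 - [\tilde{x}_k = 0]} \tag{21} \]

The required factor $S$ to recover $Z_d$ can be computed as
$S = \prod_{k \in \mathcal{B}} (e^{J_k} + q - 1)$.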

Fig. 5: The relative error $|\ln \hat{Z} - \ln Z| / \ln Z$ vs. the
coupling parameter for an $8 \times 8$ 3-state Potts model; (left)
constant coupling with zero local field; (middle) mixed coupling,
where we set $J_k = 2$ for $k \in \mathcal{B}_B$, and for
$k \in \mathcal{B}_A$, we set $J_k$ according to the value on the
$x$-axis; (right) mixed couplings with a weak external field.

Note that (21) tends to a constant factor as $J_k$ grows, which
gives reasons for the fast convergence of the uniform sampling
algorithm in this case (cf. (12), (13), and (19)), whereas
the importance sampling scheme is guaranteed to have fast
convergence when the coupling parameters associated with
the thick edges are strong (cf. (13) and (17)). Indeed, in
contrast to the convergence of MC methods in the primal NFG,
convergence of uniform sampling improves as $J_k$ becomes
larger for each $k \in \mathcal{B}$. Convergence of importance
sampling improves as $J_k$ becomes larger for each
$k \in \mathcal{B}_B$.
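The heuristic can be made concrete by looking at the ratio between the two values that a one-variable dual factor takes, $(e^{J} - 1)/(e^{J} + q - 1)$. The sketch below (the helper name `gamma_ratio` is hypothetical) evaluates this ratio for a few couplings; as it approaches one, the factor becomes nearly constant in its argument.

```python
import math


def gamma_ratio(J, q):
    """Ratio of the nonzero-argument value to the zero-argument value of the dual factor."""
    return (math.exp(J) - 1.0) / (math.exp(J) + q - 1.0)


# The ratio increases monotonically towards 1 as the coupling J grows.
ratios = [gamma_ratio(J, 3) for J in (0.5, 1.0, 2.0, 4.0)]
```

At $J = 0$ the ratio is zero, so only the all-zero assignment of that variable carries weight, which is the opposite extreme.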

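More rigorously, let us denote the global probability mass
function in the dual factor graph by $p_d(\cdot)$. Notice that
$p_d(\cdot)$ and $q_d(\cdot)$ in (14) are both defined on the same
configuration space $\mathcal{X}^{|\mathcal{B}_A|}$.

We can therefore write $p_d(\cdot)$ as a function of $q_d(\cdot)$:

\[ p_d(\tilde{\mathbf{x}}_A) = \frac{\Gamma_A(\tilde{\mathbf{x}}_A)\, \Gamma_B(\tilde{\mathbf{x}}_B)}{Z_d} \tag{22} \]

\[ = \frac{Z_{q_d}}{Z_d}\, q_d(\tilde{\mathbf{x}}_A)\, \Gamma_B(\tilde{\mathbf{x}}_B), \quad \forall \tilde{\mathbf{x}}_A \in \mathcal{X}^{|\mathcal{B}_A|} \tag{23} \]

From (23), we express the variance of $\hat{Z}_{\text{IS}}$ in (17) as

\[ \frac{L}{Z_d^2} \mathrm{Var}[\hat{Z}_{\text{IS}}] = \frac{L}{Z_d^2} \Big( \mathrm{E}[\hat{Z}_{\text{IS}}^2] - (\mathrm{E}[\hat{Z}_{\text{IS}}])^2 \Big) \tag{24} \]

\[ = \Big( \frac{Z_{q_d}}{Z_d} \Big)^2 \mathrm{E}_{q_d}[\Gamma_B^2(\tilde{\mathbf{x}}_B)] - 1 \tag{25} \]

\[ = \sum_{\tilde{\mathbf{x}}_A} \frac{p_d^2(\tilde{\mathbf{x}}_A)}{q_d(\tilde{\mathbf{x}}_A)} - 1 \tag{26} \]

\[ = \chi^2\big( p_d(\tilde{\mathbf{x}}_A), q_d(\tilde{\mathbf{x}}_A) \big) \tag{27} \]

where $\chi^2(\cdot, \cdot)$ denotes the chi-squared divergence, which
is always non-negative, with equality to zero if and only if its
two arguments are equal [27, Chapter 4].

For simplicity, let us assume that for $k \in \mathcal{B}_B$ the
coupling parameters are constant, denoted by $J_B$. From (21),
$\lim_{J_B \to \infty} p_d(\tilde{\mathbf{x}}_A) = q_d(\tilde{\mathbf{x}}_A)$.
Hence

\[ \lim_{J_B \to \infty} \chi^2\big( p_d(\tilde{\mathbf{x}}_A), q_d(\tilde{\mathbf{x}}_A) \big) = 0 \tag{28} \]

We conclude that $Z_d$ can be estimated efficiently using the
importance sampling estimator when, for $k \in \mathcal{B}_B$, the
coupling parameters of the model are strong.

Similarly, one can show

\[ \frac{L}{Z_d^2} \mathrm{Var}[\hat{Z}_{\text{Unif}}] = \chi^2\big( p_d(\tilde{\mathbf{x}}_A), u_d(\tilde{\mathbf{x}}_A) \big) \tag{29} \]

Let us assume that all the coupling parameters are constant,
denoted by $J$. From (21),
$\lim_{J \to \infty} p_d(\tilde{\mathbf{x}}_A) = u_d(\tilde{\mathbf{x}}_A)$;
therefore, uniform sampling can efficiently estimate $Z_d$ in
models at very low temperature.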

B. Further remarks 

We finish this section with a few additional remarks. 

i) Given the partitioning, the computational complexity of
our algorithms is $O(N)$ per sample, which makes the total
computational complexity $O(NL)$.

ii) Successive applications of our algorithm can yield all the
marginals as ratios of two partition functions, with
complexity $O(N^2 L)$.

iii) The choice of partitioning in the dual NFG is arbitrary as
long as $\tilde{\mathbf{X}}_B$ forms a spanning tree in Fig. 3.
From (28), a good heuristic strategy for partitioning in models
with a mixture of strong and weak couplings is to include edges
associated with stronger couplings in $\tilde{\mathbf{X}}_B$.

iv) If the couplings are relatively strong, we can employ
annealed importance sampling [28] (see Appendix II).
In the mid-temperature regime, the performance of MC
methods in the dual NFG should be compared to the
performance of MC methods applied directly to the primal
NFG, as in Fig. 1.

v) For infinite-size models (i.e., $N \to \infty$), we need to
analyze the variance of estimating $\ln(Z)/N$, because $Z$
itself becomes unbounded.

vi) Incidentally, the dual NFG gives an alternative derivation
of the known analytical solution to the partition
function of the one-dimensional (1D) Potts model (see
Appendix III).

V. Numerical experiments 

In this section, we first consider tractable models amenable 
for exact computation and then apply our MC methods to 
larger instances, for which the exact computation is no longer 
feasible. 

Tractable models: We consider instances of $8 \times 8$ 3-state
Potts models with periodic boundaries, the largest size
solvable via the junction tree algorithm implementation used here.
For comparison, we compute the relative error in estimating
the log partition function, i.e., $|\ln \hat{Z} - \ln Z| / \ln Z$,
vs. the coupling parameter, for the importance sampling and
uniform sampling algorithms in the dual NFG, and the belief
propagation (BP), the generalized BP (GBP), and the tree
expectation propagation (TREEEP) algorithms in the primal NFG.
These algorithms are the state-of-the-art deterministic methods
that perform best in our set-up among all the variants available
in the same implementation.

In GBP (GBPLoop), one cluster is used for each cycle of 
length four, and one for each maximal factor. In importance 
sampling and uniform sampling algorithms, we average over 
50 trials, each with L = 10® samples, which takes about two 
minutes on a 2GHz Intel Xeon CPU. 

Fig. 5 (left) shows the results for grids with constant
coupling. At relatively high temperature (for $J < 1$), MC
methods in the dual NFG perform poorly, while GBP gives the
best results. In contrast, for $J > 1$, importance sampling and
uniform sampling both perform well. In particular, importance
sampling outperforms by more than an order of magnitude
the second best approaches, uniform sampling and TREEEP
(which perform comparably).

The effect of having mixed couplings is illustrated in Fig. 5
(middle), where for $k \in \mathcal{B}_B$ we set $J_k = 2$, and for
$k \in \mathcal{B}_A$ couplings are fixed according to the value on
the $x$-axis. We observe that importance sampling outperforms all
the other approaches in a wider range of couplings (for
$J > 0.3$), while BP and GBP give very poor results.

In Fig. 5 (right), we show the effect of having a constant
external field $H = 0.1$ (see Appendix I) and mixed couplings
as in Fig. 5 (middle). In this case, importance sampling
performs better than the deterministic algorithms at very low
temperature (for $J > 1.5$) while GBP and uniform sampling
perform poorly.

Large grids: We show simulation results for the log partition
function per site, i.e., $\frac{1}{N} \ln Z$, vs. the number of
samples for one representative instance of the model with
bond-dependent couplings. In other words, couplings are drawn
randomly, but remain fixed in the simulation.$^2$

We first consider a 3-state Potts model of size
$N = 40 \times 40$, and fix $J_k \sim \mathcal{U}[2.5, 3.0]$ for
$k \in \mathcal{B}_B$. For $k \in \mathcal{B}_A$, we set
$J_k \sim \mathcal{U}[2.5, 3.0]$ and $J_k \sim \mathcal{U}[2.0, 2.5]$
in the first and in the second experiments, resp. Simulation
results obtained from IMP (solid black lines) and UNI (dashed
blue lines) in the dual NFG are shown in Figs. 6 and 7. For very
large coupling parameters, convergence of the algorithms is
comparable, see Fig. 6. However, for a wider range of couplings,
uniform sampling has issues with slow convergence, see Fig. 7.

Fig. 6: Estimated $\ln(Z)/N$ vs. number of samples using IMP
(solid black lines) and UNI (dashed blue lines) in the dual
NFG of a $40 \times 40$ 3-state Potts model with
$J_k \sim \mathcal{U}[2.5, 3.0]$ for $k \in \mathcal{B}_A$ and
$J_k \sim \mathcal{U}[2.5, 3.0]$ for $k \in \mathcal{B}_B$.

Fig. 7: Everything as in Fig. 6 but with
$J_k \sim \mathcal{U}[2.0, 2.5]$ for $k \in \mathcal{B}_A$.

We then consider a 4-state Potts model with size
$N = 30 \times 30$, and fix $J_k \sim \mathcal{U}[0.75, 2.25]$ for
$k \in \mathcal{B}_A$. We set $J_k \sim \mathcal{U}[2.25, 3.25]$ in
the third experiment (see Fig. 8) and
$J_k \sim \mathcal{U}[3.25, 4.25]$ in the fourth experiment (see
Fig. 9), for $k \in \mathcal{B}_B$. Simulation results obtained
from IMP in the dual NFG are shown in Figs. 8 and 9. The
estimated $\ln(Z)/N$ are about 4.223 and 5.2215, resp. In
accordance with our analysis in Section IV-A, convergence of the
importance sampling algorithm improves as $J_k$ becomes stronger
for $k \in \mathcal{B}_B$. Remarkably, for this size of grid and
with such strong couplings, as in Fig. 9, good convergence is
achieved using only a few thousand samples.

$^2$In the jargon of statistical physics, estimating quantities for
a fixed set of couplings (generated according to some
distribution) is called the “quenched average”.

Fig. 8: Estimated $\ln(Z)/N$ vs. number of samples using IMP
in the dual NFG of a $30 \times 30$ 4-state Potts model with
$J_k \sim \mathcal{U}[0.75, 2.25]$ for $k \in \mathcal{B}_A$ and
$J_k \sim \mathcal{U}[2.25, 3.25]$ for $k \in \mathcal{B}_B$.

Fig. 9: Everything as in Fig. 8 but with
$J_k \sim \mathcal{U}[3.25, 4.25]$ for $k \in \mathcal{B}_B$.


VI. Conclusion 

We presented an importance sampling algorithm in the 
dual NFG of the 2D Potts model to estimate the partition 
function. In contrast to MC methods in the primal domain, 
convergence of our algorithm improves when the coupling 
parameters (on only a subset of the edges) become stronger. 
Furthermore, the method outperforms the state of the art 
deterministic algorithms in the primal domain, as well as 
uniform sampling methods in the dual domain, by more than 
an order of magnitude in terms of accuracy. 


Since the dual NFG retains the topology of the primal NFG,
we expect our algorithm to perform well in graphical models
with arbitrary topology, especially at low temperature.
Extending our methods to more general settings and applying
variational inference in the dual NFG are the focus of our
current research.
Appendix I

The 2D Potts Model in an External Field

In this section, we describe an importance sampling algorithm
to estimate the partition function of the 2D $q$-state Potts
model in the presence of an external field.

In this case, the energy of a configuration $\mathbf{x}$ is given
by the Hamiltonian

\[ \mathcal{H}(\mathbf{x}) = -\sum_{(k,\ell) \in \mathcal{B}} J_{k,\ell} \cdot [x_k = x_\ell] - \sum_{m=1}^{N} H_m \cdot [x_m = 0] \tag{30} \]

where the real coupling parameter $J_{k,\ell}$ controls the strength
of the interaction between $(x_k, x_\ell)$ and the real parameter
$H_m$ corresponds to the presence of an external field.

In (30), we have assumed that the external field applies
only if $x_m = 0$. The Hamiltonian can be defined in other ways,
e.g., the field can apply when $x_m = 1$ or when $x_m$ is in more
than one state.

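As in (3), for each adjacent pair $(x_k, x_\ell)$, we let

\[ \kappa_{k,\ell}(x_k, x_\ell) = e^{J_{k,\ell} \cdot [x_k = x_\ell]} \tag{31} \]

and for each $x_m$, let $\tau_m : \mathcal{X} \to \mathbb{R}_{\ge 0}$ be

\[ \tau_m(x_m) = e^{H_m \cdot [x_m = 0]} \tag{32} \]

We then define $f : \mathcal{X}^N \to \mathbb{R}_{\ge 0}$ as

\[ f(\mathbf{x}) = \prod_{(k,\ell) \in \mathcal{B}} \kappa_{k,\ell}(x_k, x_\ell) \prod_{m=1}^{N} \tau_m(x_m) \tag{33} \]

The NFG of the 2D Potts model in an external field (i.e., the
factorization in (33)) is shown in Fig. 10, where the unlabeled
normal-size boxes represent factors (31) and the small boxes
represent factors (32).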

Again, we are interested in estimating the partition function
$Z$, which is expressed as

\[ Z = \sum_{\mathbf{x} \in \mathcal{X}^N} f(\mathbf{x}) \tag{34} \]

The dual NFG of the 2D Potts model is illustrated in Fig. 11,
where factors attached to each equality indicator factor are as
in (11), and factors attached to each mod-$q$ indicator factor are
given by

\[ \lambda_m(\tilde{x}_m) = \begin{cases} (e^{H_m} + q - 1)/q, & \text{if } \tilde{x}_m = 0 \\ (e^{H_m} - 1)/q, & \text{otherwise,} \end{cases} \tag{35} \]

which is the 1D DFT of (32).

We focus on ferromagnetic models (i.e., $J_k > 0$) in a
nonnegative external field (i.e., $H_m \ge 0$); therefore, (35)
will be nonnegative.

Fig. 10: NFG of the 2D Potts model in an external field,
where unlabeled normal-size boxes represent factors (31),
small boxes represent factors (32), and boxes containing “=”
symbols are equality indicator factors.

Fig. 11: Dual NFG of the 2D Potts model, where the unlabeled
normal-size boxes represent (11), unlabeled small boxes
represent (35), and the small circles denoted by “o” are sign
inverters.

We can design an importance sampling algorithm in Fig. 11
to compute an estimate of $Z_d$. Again, we need to partition
$\tilde{\mathbf{X}}$ into $\tilde{\mathbf{X}}_A$ and
$\tilde{\mathbf{X}}_B$, with the condition that
$\tilde{\mathbf{X}}_B$ is a linear combination (involving the
mod-$q$ indicator factors) of $\tilde{\mathbf{X}}_A$. Indeed, the
same partitioning presented in Fig. 4 is applicable. We only
need to add edges (variables) representing the external field to
the thin edges, i.e., to $\tilde{\mathbf{X}}_A$.

For a valid configuration
$\tilde{\mathbf{x}} = (\tilde{\mathbf{x}}_A, \tilde{\mathbf{x}}_B)$,
we suppose $\tilde{\mathbf{x}}_A = (\tilde{\mathbf{y}}, \tilde{\mathbf{z}})$,
where $\tilde{\mathbf{y}}$ contains all the thin edges attached to
the small unlabeled boxes (which represent variables involved
in factors (35)), and $\tilde{\mathbf{z}}$ contains all the
variables associated with the thin bonds (which represent
variables involved in factors (11)).

We will need the following lemma.

Lemma 1. If $\tilde{\mathbf{x}}$ is a valid configuration in the
dual NFG, then

\[ \sum_{m=1}^{N} \tilde{y}_m = 0 \tag{36} \]

Proof. We consider the sum of all the $N$ components of
$\tilde{\mathbf{y}}$ as $c = \sum_{m=1}^{N} \tilde{y}_m$. Each mod-$q$
indicator factor imposes the constraint that all its incident
variables sum to 0. In $c$, each $\tilde{y}_m$ can thus be expanded
as the sum of the corresponding variables associated with the
bonds. Furthermore, the variables on the bonds each appear twice
in this expansion, once with a positive sign and once with a
negative sign. We conclude that $c = 0$. ■
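The cancellation argument in the proof can be checked numerically. The sketch below makes two assumptions for illustration: bonds are oriented so that each bond variable enters its first endpoint with a plus sign and its second endpoint with a minus sign, and the external-field variable at each node is obtained by solving that node's mod-$q$ constraint. The helper names are hypothetical.

```python
import random


def grid_bonds(n):
    """Directed bonds (k, l) of an n x n periodic grid; there are 2*n*n."""
    bonds = []
    for r in range(n):
        for c in range(n):
            k = r * n + c
            bonds.append((k, r * n + (c + 1) % n))
            bonds.append((k, ((r + 1) % n) * n + c))
    return bonds


def field_edge_values(n, q, x_bond):
    """Solve each node's mod-q constraint for its external-field variable."""
    signed = [0] * (n * n)
    for b, (k, l) in enumerate(grid_bonds(n)):
        signed[k] += x_bond[b]         # positive sign at one endpoint
        signed[l] -= x_bond[b]         # negative sign at the other endpoint
    # y_m + (signed bond sum at node m) = 0 (mod q)
    return [(-s) % q for s in signed]
```

For any assignment of the bond variables, `sum(field_edge_values(n, q, x_bond)) % q` evaluates to zero, since every bond contributes once with each sign.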

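Lemma 1 asserts that in a valid configuration $\tilde{\mathbf{x}}$
there is a linear dependency among the components of
$\tilde{\mathbf{y}}$. Therefore, we need to exclude one of the
edges attached to the small unlabeled boxes from
$\tilde{\mathbf{X}}_A$. Without loss of generality, we assume
that the excluded edge is involved in $\lambda_N(\cdot)$.

We then define

\[ \Gamma(\tilde{\mathbf{x}}_A) = \prod_{k \in \mathcal{B}_A} \gamma_k(\tilde{x}_k) \prod_{m=1}^{N-1} \lambda_m(\tilde{x}_m) \tag{37} \]

\[ \Lambda(\tilde{\mathbf{x}}_B) = \lambda_N(\tilde{x}_N) \prod_{k \in \mathcal{B}_B} \gamma_k(\tilde{x}_k) \tag{38} \]

We use the following auxiliary probability mass function in
our importance sampling algorithm

\[ q_d(\tilde{\mathbf{x}}_A) = \frac{\Gamma(\tilde{\mathbf{x}}_A)}{Z_{q_d}} \tag{39} \]

where the partition function $Z_{q_d}$ is analytically available as

\[ Z_{q_d} = \sum_{\tilde{\mathbf{x}}_A} \Gamma(\tilde{\mathbf{x}}_A) \tag{40} \]

\[ = q^{|\mathcal{B}_A|} \exp\Big( \sum_{k \in \mathcal{B}_A} J_k + \sum_{m=1}^{N-1} H_m \Big) \tag{41} \]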


The product form of (37) indicates that two separate
subroutines are required at each iteration $\ell$, in order to draw
$\tilde{\mathbf{x}}_A^{(\ell)} = (\tilde{\mathbf{y}}^{(\ell)}, \tilde{\mathbf{z}}^{(\ell)})$
according to (39): one to draw the $\tilde{\mathbf{y}}^{(\ell)}$-part
and the other to draw the $\tilde{\mathbf{z}}^{(\ell)}$-part.

Drawing the $\tilde{\mathbf{z}}^{(\ell)}$-part can be done using
Algorithm 1. To draw the $\tilde{\mathbf{y}}^{(\ell)}$-part, we apply
Algorithm 3.

Algorithm 3 Drawing the $\tilde{\mathbf{y}}^{(\ell)}$-part

1: draw $u_1^{(\ell)}, u_2^{(\ell)}, \ldots, u_{N-1}^{(\ell)} \overset{\text{i.i.d.}}{\sim} \mathcal{U}[0, 1]$
2: for $m = 1$ to $N - 1$ do
3:   if $u_m^{(\ell)} < \frac{1 + (q-1)e^{-H_m}}{q}$ then
4:     $\tilde{y}_m^{(\ell)} = 0$
5:   else
6:     draw $\tilde{y}_m^{(\ell)}$ randomly in $\{1, 2, \ldots, q-1\}$
7:   end if
8: end for
9: set $\tilde{y}_N^{(\ell)} = -\sum_{m=1}^{N-1} \tilde{y}_m^{(\ell)} \bmod q$

In line 3, $(1 + (q-1)e^{-H_m})/q$ is
$\lambda_m(0) / \sum_{t=0}^{q-1} \lambda_m(t)$, and setting
$\tilde{y}_N^{(\ell)}$ in line 9 is done according to (36).

After generating
$\tilde{\mathbf{x}}_A^{(\ell)} = (\tilde{\mathbf{y}}^{(\ell)}, \tilde{\mathbf{z}}^{(\ell)})$,
we update $\tilde{\mathbf{x}}_B^{(\ell)}$ in order to create a valid
configuration
$\tilde{\mathbf{x}}^{(\ell)} = (\tilde{\mathbf{x}}_A^{(\ell)}, \tilde{\mathbf{x}}_B^{(\ell)})$.
The samples are then used to compute an estimate of $Z_d$. MC
estimates of $Z_d$ are then used to estimate $Z$ via the normal
factor graph duality theorem.


Appendix II

Annealed Importance Sampling in the Dual NFG

Annealed importance sampling is used in the primal NFG by
moving from a tractable (high temperature) distribution to the
target distribution via a sequence of intermediate distributions.

We briefly discuss how to employ annealed importance
sampling in the dual NFG to estimate the partition function,
when $J_k$ is relatively strong for $k \in \mathcal{B}_B$. For
simplicity, let us assume that the coupling parameters associated
with the thin edges and the coupling parameters associated with
the thick edges are constant, denoted by $J_A$ and $J_B$,
respectively.

We thus denote the partition function by $Z_d(J_A, J_B)$, and
express $Z_d(J_A, J_B)$ using a sequence of intermediate partition
functions by varying $J_B$ in $V$ levels as

\[ Z_d(J_A, J_B) = Z_d(J_A, J_B^{\alpha_V}) \prod_{v=0}^{V-1} \frac{Z_d(J_A, J_B^{\alpha_v})}{Z_d(J_A, J_B^{\alpha_{v+1}})} \tag{42} \]

Here, unlike typical annealing strategies in the primal
domain, $(\alpha_0, \alpha_1, \ldots, \alpha_V)$ is an increasing
sequence with $1 = \alpha_0 < \alpha_1 < \cdots < \alpha_V$; see
[28, Section 2].

If $\alpha_V$ is large enough, $Z_d(J_A, J_B^{\alpha_V})$ can be
estimated efficiently via our proposed importance sampling
algorithm. As for the intermediate steps, a sampling technique
that leaves the target distribution invariant (e.g., the
Metropolis-Hastings algorithm) is required at each level. These
intermediate target probability distributions correspond to the
intermediate partition functions. Also, $V$ should be sufficiently
large to ensure that intermediate target distributions are close
enough.


Appendix III

Partition Function of the 1D Potts Model

We use the normal factor graph duality theorem to compute
the partition function of the 1D $q$-state Potts model of size
$N$, and with periodic boundary conditions. The NFG of the
model is shown in Fig. 12, where boxes represent factors as
in (3) and edges represent the variables.

Fig. 12: The NFG of the 1D Potts model with $N = 4$ and
with periodic boundary conditions.

Following our approach in Section III and after some
modifications, we can obtain the dual NFG of the 1D Potts
model as in Fig. 13, where the unlabeled boxes represent
factors as in (11) and boxes containing “=” symbols are
equality indicator factors.

Fig. 13: Dual NFG of the 1D Potts model.

Note that in the 1D case, the inverse Fourier transform of
each equality indicator factor (with degree two) becomes a
mod-$q$ indicator factor with scale factor one.

There are only $q$ valid sequences in Fig. 13, namely, the
all-zero sequence, the all-one sequence, etc. We can therefore
compute $Z_d$ directly as

\[ Z_d = \prod_{k=1}^{N} (e^{J_k} + q - 1) + (q - 1) \prod_{k=1}^{N} (e^{J_k} - 1) \tag{43} \]

Since all the local scale factors are one, $Z = Z_d$.

Let us suppose that the coupling parameters of the model
are constant, denoted by $J$. In the thermodynamic limit of large
$N$, we obtain

\[ \lim_{N \to \infty} \frac{\ln Z}{N} = \ln(e^{J} + q - 1) \tag{44} \]

In statistical physics, the transfer matrix method is usually
employed to compute the partition function of classical 1D
models. See [6, Chapter 5] for more details.
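The closed form for the 1D model with periodic boundaries can be compared against direct enumeration on a short chain. In the sketch below, the function names are hypothetical and the couplings are given as a list `J` with one entry per bond; the closed form is the product of $e^{J_k} + q - 1$ over all bonds plus $(q - 1)$ times the product of $e^{J_k} - 1$.

```python
import itertools
import math


def Z_1d_brute_force(J, q):
    """Enumerate all q**N configurations of a periodic chain with couplings J."""
    N = len(J)
    Z = 0.0
    for x in itertools.product(range(q), repeat=N):
        # Bond k couples site k to site (k + 1) mod N
        Z += math.exp(sum(J[k] for k in range(N) if x[k] == x[(k + 1) % N]))
    return Z


def Z_1d_closed_form(J, q):
    """Closed form obtained from the q valid sequences of the dual NFG."""
    all_zero = math.prod(math.exp(Jk) + q - 1 for Jk in J)
    constant_nonzero = math.prod(math.exp(Jk) - 1 for Jk in J)
    return all_zero + (q - 1) * constant_nonzero
```

For a chain of length four, as in Fig. 12, the two functions should agree up to floating-point error for any choice of couplings.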